Semi-Automatic Identification of Bilingual Synonymous Technical Terms from Phrase Tables and Parallel Patent Sentences
Abstract
In research on machine translation of patent documents, automatically acquiring translation equivalent pairs of technical terms from parallel patent documents is one of the most important issues. We take an approach that utilizes the phrase table of a state-of-the-art phrase-based statistical machine translation model. In this task, we consider situations where a technical term is observed in many parallel patent sentences and is translated into many translation equivalents. We apply an SVM to the task of identifying synonymous translation equivalent pairs and achieve almost 98% precision and over 40% F-measure. Then, in order to improve recall, we introduce a semi-automatic framework in which we select more than one seed for each set of candidate bilingual synonymous term pairs. By manually judging whether each pair of two seeds is synonymous or not, we achieve over 95% precision and 50% recall.
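The seed-judgment step and the evaluation measures can be illustrated with a minimal Python sketch. It assumes that candidate translation equivalents are compared as unordered pairs, that the human judgment on each pair of seeds is stood in for by a callable (`is_synonymous`, hypothetical), and that seeds judged synonymous are merged transitively with union-find. This merging rule is an illustrative assumption, not necessarily the procedure used in the paper. Precision, recall, and F-measure are computed over predicted versus gold synonymous pairs; all function names and toy terms are hypothetical.

```python
from itertools import combinations


def precision_recall_f(predicted, gold):
    """Precision, recall, and balanced F-measure of predicted synonymous pairs.

    Both arguments are sets of frozenset({term_a, term_b}), so pair order is ignored.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


def group_seeds(seeds, is_synonymous):
    """Ask for one judgment per seed pair and merge synonymous seeds (union-find).

    `is_synonymous(a, b)` is a hypothetical stand-in for the human annotator.
    Transitive merging is an illustrative assumption, not the paper's stated rule.
    """
    parent = {seed: seed for seed in seeds}

    def find(x):
        # Follow parent links to the group representative, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # k seeds -> k*(k-1)/2 manual judgments, one per unordered seed pair.
    for a, b in combinations(seeds, 2):
        if is_synonymous(a, b):
            parent[find(a)] = find(b)

    # Collect seeds by representative, preserving the input order of seeds.
    groups = {}
    for seed in seeds:
        groups.setdefault(find(seed), []).append(seed)
    return list(groups.values())


if __name__ == "__main__":
    # Toy data (hypothetical): English-side translations of one Japanese term.
    seeds = ["battery", "cell", "electric cell", "storage"]
    judged_synonymous = {frozenset(("battery", "cell")), frozenset(("cell", "electric cell"))}
    print(group_seeds(seeds, lambda a, b: frozenset((a, b)) in judged_synonymous))
    # -> [['battery', 'cell', 'electric cell'], ['storage']]

    predicted = {frozenset(("battery", "cell")), frozenset(("battery", "storage"))}
    gold = {frozenset(("battery", "cell")), frozenset(("cell", "electric cell"))}
    print(precision_recall_f(predicted, gold))
    # -> (0.5, 0.5, 0.5)
```

Because every seed pair is judged, the manual effort in this sketch grows quadratically with the number of seeds per candidate set; a variant could skip pairs already merged into one group, at the cost of relying on transitivity.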
Similar resources
Evaluating Features for Identifying Japanese-Chinese Bilingual Synonymous Technical Terms from Patent Families
In the process of translating patent documents, a bilingual lexicon of technical terms is an indispensable knowledge source. It is important to develop techniques for acquiring technical term translation equivalent pairs automatically from parallel patent documents. We take an approach that utilizes the phrase table of a state-of-the-art phrase-based statistical machine translation model. First, we coll...
Identifying Japanese-Chinese Bilingual Synonymous Technical Terms from Patent Families
In the task of acquiring Japanese-Chinese technical term translation equivalent pairs from parallel patent documents, this paper considers situations where a technical term is observed in many parallel patent sentences and is translated into many translation equivalents and studies the issue of identifying synonymous translation equivalent pairs. First, we collect candidates of synonymous trans...
Collecting Bilingual Technical Terms from Patent Families of Character-Segmented Chinese Sentences and Morpheme-Segmented Japanese Sentences
In manual translation of patent documents, a bilingual lexicon of technical terms is indispensable for a translator to translate patent documents efficiently. Dong et al. (2015) proposed a method of generating a bilingual technical term lexicon from morpheme-segmented parallel patent sentences. The proposed method estimates Japanese-Chinese translation of technical terms using the phrase translation tab...
Compositional Translation of Technical Terms by Integrating Patent Families as a Parallel Corpus and a Comparable Corpus
In previous methods of generating a bilingual lexicon from parallel patent sentences extracted from patent families, the portion from which parallel patent sentences are extracted is only about 30% of the entire “Background” and “Embodiment” parts, while the remaining 70% or so is not used. Considering this situation, this paper proposes generating a bilingual lexicon for technical terms not only from the 30% b...
Integrating a Phrase-based SMT Model and a Bilingual Lexicon for Human in Semi-Automatic Acquisition of Technical Term Translation Lexicon
This paper presents an attempt at developing a technique for acquiring translation pairs of technical terms with sufficiently high precision from parallel patent documents. The approach taken in the proposed technique is based on integrating the phrase translation table of a state-of-the-art statistical phrase-based machine translation model, and compositional translation generation based on an e...